Introduction to coreMicrobiome

Dan Lin

2022-04

1. Introduction


coreMicrobiome is a web-based R Shiny graphical user interface (GUI), accompanied by an R package, that lets scientists with or without programming expertise explore and visualize core microbial species. It comprises four functional modules:

(1) Initial visualization of sampling effort and distribution of dominant bacterial taxa among groups or individual samples at different taxonomic levels;

(2) Analysis and visualization of abundance-occupancy distributions;

(3) Co-occurrence network construction, analysis, comparisons and visualizations;

(4) A combined visualization of the abundance-occupancy distribution and the co-occurrence network for understanding the core species from both the common perspective and the ecosystem perspective.

2. Overview of the coreMicrobiome analysis


[Figure: overview of the coreMicrobiome analysis]

3. Loading packages


Let us first load the package from GitHub.

Let us then load some necessary libraries.

4. Data preparation


We show the GlobalPatterns example workflow as initially outlined in (McMurdie and Holmes 2013).

We retrieve the example data in phyloseq format.

Let us load the data.

We use only the Archaea data to reduce the data load.
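The data preparation steps above can be sketched with phyloseq, which ships the GlobalPatterns dataset. This is a minimal sketch, assuming phyloseq is installed; the object name `gp_archaea` is chosen here for illustration and is not taken from coreMicrobiome.

```r
# Load phyloseq, which ships the GlobalPatterns example dataset
library(phyloseq)

# Retrieve the example data as a phyloseq object
data(GlobalPatterns)

# Keep only taxa assigned to the kingdom Archaea to reduce the data load
# (gp_archaea is an illustrative name, not one defined by coreMicrobiome)
gp_archaea <- subset_taxa(GlobalPatterns, Kingdom == "Archaea")

# Print a summary of the reduced object
gp_archaea
```

Subsetting by taxa leaves the set of samples unchanged, so the sample metadata can still be used for grouping later on.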

5. coreMicrobiome


5.1 Basic plot

We plot abundance among samples and groups (sample_group) at a specific taxonomic level.

We plot a heatmap based on count data, relative abundance data, clr-transformed data or log10-transformed data.
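The four data representations named above can be illustrated in base R. This is a sketch on a made-up count matrix; the pseudocount of 1 used before taking logarithms is an assumption, and the package may handle zeros differently.

```r
# Toy count matrix: rows are OTUs, columns are samples (values are made up)
counts <- matrix(
  c(10, 0,  5,
    20, 4,  0,
     0, 6, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("otu", 1:3), paste0("s", 1:3))
)

# Relative abundance: divide each column by its sample total
rel_abund <- sweep(counts, 2, colSums(counts), "/")

# log10 transformation; the pseudocount of 1 (an assumption) keeps zeros finite
log10_counts <- log10(counts + 1)

# Centered log-ratio (clr): log counts minus the per-sample mean of the logs
log_counts <- log(counts + 1)
clr_counts <- sweep(log_counts, 2, colMeans(log_counts), "-")
```

After the clr transformation each sample's values sum to zero, which is a quick way to check that the centering was applied per sample rather than per OTU.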

We plot the species accumulation curve with boxplots indicating the 95% CI.
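The idea behind a species accumulation curve can be sketched in base R by adding samples in random order and counting the OTUs seen so far. The data below are simulated, and the function name `accumulate_once` is hypothetical. Note that the boxes drawn by `boxplot` show quartiles across random orderings, which is not the same summary as the 95% CI mentioned above.

```r
set.seed(1)  # make the random sample orderings reproducible

# Toy presence/absence matrix: rows are OTUs, columns are samples (simulated)
presence <- matrix(rbinom(20 * 8, size = 1, prob = 0.3), nrow = 20) == 1

# For one random ordering of samples, count the OTUs seen after each sample
accumulate_once <- function(pa) {
  ord <- sample(ncol(pa))
  # apply() over OTUs returns a samples-by-OTUs matrix of running detections
  seen <- apply(pa[, ord, drop = FALSE], 1, cumsum) > 0
  rowSums(seen)
}

# Repeat over many orderings; rows are orderings, columns are sample counts
n_perm <- 100
curves <- t(replicate(n_perm, accumulate_once(presence)))

# One box per number of samples summarizes the spread across orderings
boxplot(curves, xlab = "Number of samples", ylab = "OTUs observed")
```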

5.2 Abundance-occupancy analysis

We perform the abundance-occupancy analysis as described in (Shade and Stopnisek 2019).

First, we plot Bray-Curtis (BC) similarity against ranked OTUs.

Then, we observe the relationship between occupancy and abundance, with color denoting core or non-core OTUs.

Then, we plot the phylogenetic tree with a heatmap denoting the average occurrence frequency of each taxon, with color denoting core or non-core OTUs.

Then, we show the average occupancy and average relative abundance of core and non-core OTUs.
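The two quantities compared here can be computed in base R as follows. This is a sketch on a made-up count matrix. The core rule used (detected in every sample) is a deliberately simple assumption for illustration; it is not the Bray-Curtis based criterion of Shade and Stopnisek (2019).

```r
# Toy count matrix: rows are OTUs, columns are samples (values are made up)
counts <- matrix(
  c(30, 25, 40, 35,
     5,  0,  0,  2,
    10, 12,  0,  8,
     0,  0,  1,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("otu", 1:4), paste0("s", 1:4))
)

# Occupancy: fraction of samples in which each OTU is detected
occupancy <- rowMeans(counts > 0)

# Mean relative abundance of each OTU across samples
rel_abund <- sweep(counts, 2, colSums(counts), "/")
mean_rel_abund <- rowMeans(rel_abund)

# Illustrative core rule (an assumption): detected in every sample
is_core <- occupancy == 1

# Average occupancy and relative abundance for core vs non-core OTUs
summary_tbl <- aggregate(
  cbind(occupancy, mean_rel_abund) ~ is_core,
  data = data.frame(occupancy, mean_rel_abund, is_core),
  FUN = mean
)
summary_tbl
```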

Finally, we use random forest to identify the importance of OTUs (core OTUs, non-core OTUs and all OTUs) in predicting/classifying the sample_group and display the accuracy.

5.3 Network analysis

We construct the co-occurrence networks on the filtered data (OTUs with low occurrence frequency should be removed) using three optional compositional association methods: propr, sparcc and cclasso. Each network is built for a given permutation number, FDR and association threshold.

In the example, we construct the networks only with propr and sparcc. When using the defined thresholds (FDR = 0.1, cor = 0.6), cclasso identifies no association.
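How the two thresholds turn an association matrix into network edges can be sketched in base R. The association and p-value matrices below are made up; in practice they would come from propr, sparcc or cclasso. Using the Benjamini-Hochberg adjustment for the FDR and applying the 0.6 cutoff to the absolute association are both assumptions, since the package's exact procedure is not described here.

```r
# Made-up symmetric association and p-value matrices for four OTUs
otus <- paste0("otu", 1:4)
assoc <- matrix(c( 1.0,  0.8, -0.7,  0.2,
                   0.8,  1.0,  0.1,  0.65,
                  -0.7,  0.1,  1.0,  0.3,
                   0.2,  0.65, 0.3,  1.0),
                nrow = 4, dimnames = list(otus, otus))
pval <- matrix(c(0,     0.001, 0.004, 0.6,
                 0.001, 0,     0.8,   0.2,
                 0.004, 0.8,   0,     0.5,
                 0.6,   0.2,   0.5,   0),
               nrow = 4, dimnames = list(otus, otus))

# Work on each OTU pair once (upper triangle of the symmetric matrices)
pairs <- which(upper.tri(assoc), arr.ind = TRUE)
edges <- data.frame(
  from  = otus[pairs[, "row"]],
  to    = otus[pairs[, "col"]],
  assoc = assoc[pairs],
  p     = pval[pairs],
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment (an assumption) across all tested pairs
edges$q <- p.adjust(edges$p, method = "BH")

# Keep pairs passing both thresholds (FDR = 0.1, |association| >= 0.6)
kept <- edges[edges$q <= 0.1 & abs(edges$assoc) >= 0.6, ]
kept
```

A pair such as otu2 and otu4 shows why both filters matter: its association of 0.65 exceeds the cutoff, but its adjusted p-value does not pass the FDR threshold, so it is dropped.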

Then, we calculate the node-level and network-level properties of these two networks and perform statistical testing.
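To make the distinction between node-level and network-level properties concrete, here is a base R sketch on a made-up adjacency matrix. Degree, edge count and density are only examples of such properties; which ones the package reports, and how it tests them, is not shown here.

```r
# Made-up symmetric adjacency matrix for an undirected network of four OTUs
otus <- paste0("otu", 1:4)
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, dimnames = list(otus, otus))

# Node-level property: degree, the number of edges attached to each node
degree <- rowSums(adj)

# Network-level properties: edge count and density (observed / possible edges)
n_nodes <- nrow(adj)
n_edges <- sum(adj[upper.tri(adj)])
density <- n_edges / choose(n_nodes, 2)
```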

Finally, we compare the common/shared nodes and edges between these two networks.
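Comparing two networks for shared nodes and edges reduces to set intersection once each undirected edge has an order-independent key. The sketch below uses made-up edge lists; the helper `edge_key` is hypothetical and not part of coreMicrobiome.

```r
# Made-up undirected edge lists for two networks
net_a <- data.frame(from = c("otu1", "otu1", "otu2"),
                    to   = c("otu2", "otu3", "otu4"),
                    stringsAsFactors = FALSE)
net_b <- data.frame(from = c("otu2", "otu3", "otu5"),
                    to   = c("otu1", "otu4", "otu1"),
                    stringsAsFactors = FALSE)

# Order-independent edge key so that otu1--otu2 equals otu2--otu1
edge_key <- function(net) {
  paste(pmin(net$from, net$to), pmax(net$from, net$to), sep = "--")
}

# Nodes and edges present in both networks
shared_nodes <- intersect(c(net_a$from, net_a$to), c(net_b$from, net_b$to))
shared_edges <- intersect(edge_key(net_a), edge_key(net_b))
```

Sorting the two endpoints before pasting them together avoids missing an edge that is listed in opposite directions in the two networks.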

5.4 Combined plot

The function phnetworks implements a user-friendly wrapper for visualization in abundance-occupancy analysis and network analysis.

  1. Plot a phylogenetic tree showing the evolutionary relationship of all taxa.
  2. Plot heatmaps showing the average occurrence frequency and average relative abundance of each taxon among all sample groups.
  3. Plot the networks constructed by the 3 optional compositional association methods (propr, sparcc and cclasso), which show the co-occurrence role of each taxon.

6. Package versions


References


McMurdie, P. J., and S. Holmes. 2013. "Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data." PLoS One 8 (4): e61217. https://doi.org/10.1371/journal.pone.0061217.

Shade, A., and N. Stopnisek. 2019. "Abundance-Occupancy Distributions to Prioritize Plant Core Microbiome Membership." Current Opinion in Microbiology 49: 50-58. https://doi.org/10.1016/j.mib.2019.09.008.